Conversation

@suexu1025 commented Oct 17, 2025

Description

Add a config flag, float32_weight_sum, to control whether full fp32 precision is used for the weight_sum during the final unpermute in MoE.
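As a rough sketch of what this controls (function and tensor names, shapes, and the einsum spec below are illustrative assumptions, not the actual moe.py code), the flag decides whether the expert-weighted combination is accumulated in float32 before being cast back to the model dtype:

import jax.numpy as jnp

def weighted_combine(expert_outputs, routing_weights, float32_weight_sum, model_dtype=jnp.bfloat16):
  # expert_outputs:  [tokens, experts_per_token, hidden]
  # routing_weights: [tokens, experts_per_token]
  if float32_weight_sum:
    # Up-cast so the multiply-accumulate of the weight_sum runs in full fp32.
    expert_outputs = expert_outputs.astype(jnp.float32)
    routing_weights = routing_weights.astype(jnp.float32)
  combined = jnp.einsum("tkh,tk->th", expert_outputs, routing_weights)
  # Cast back so downstream layers see the usual activation dtype.
  return combined.astype(model_dtype)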

Tests

Final eval loss at 300 steps:

  • 2.394 (cloudlog: https://cloudlogging.app.goo.gl/Q5o2tac9aypGGMyV6)
  • 2.393 (cloudlog: https://cloudlogging.app.goo.gl/L2N43dAZiHap1Djk7)
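The two losses are nearly identical, suggesting the extra precision has only a marginal effect on this workload. For intuition, here is a toy, made-up example (numbers are not from this PR) of the kind of rounding difference in the weighted sum that the flag targets:

import jax.numpy as jnp

w = jnp.array([0.4, 0.3, 0.2, 0.1])                  # routing weights for one token
x = jnp.array([[1.001], [0.999], [1.003], [0.997]])  # per-expert outputs, hidden size 1

fp32_sum = jnp.einsum("k,kh->h", w, x)  # full-precision weighted sum
bf16_sum = jnp.einsum("k,kh->h", w.astype(jnp.bfloat16),
                      x.astype(jnp.bfloat16)).astype(jnp.float32)  # inputs rounded to bf16 first
print(fp32_sum, bf16_sum)  # bf16 keeps ~8 mantissa bits, so the two results can differ slightly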

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@suexu1025 requested a review from RissyRan October 17, 2025 18:54
@suexu1025 requested a review from RissyRan October 31, 2025 19:01
@github-actions

🤖 Hi @RissyRan, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions bot left a comment

📋 Review Summary

This pull request introduces a new configuration flag, float32_weight_sum, to control the precision of the weighted sum operation in the Mixture of Experts (MoE) layers. The changes are well-implemented and provide useful flexibility for balancing performance and numerical precision.

🔍 General Feedback

  • The addition of the float32_weight_sum flag is a good feature for optimizing MoE layers.
  • The implementation in src/MaxText/layers/moe.py correctly applies the conditional casting based on the new configuration.
  • A minor style suggestion was made to improve comment consistency.

cast_logits_to_fp32: True # whether to cast the logits to fp32. The higher precision is generally beneficial, but it can vary slightly.
float32_qk_product: False # in dot_product attention, whether to cast to fp32 the inputs to qk product
float32_logits: False # in dot_product attention, whether to cast to fp32 the inputs to softmax
float32_weight_sum: True # whether to use full fp32 precision for weight_sum during final unpermute in moe

🟢 Nit: For consistency and clarity, it's better to use "MoE" instead of "moe".

Suggested change
float32_weight_sum: True # whether to use full fp32 precision for weight_sum during final unpermute in moe
float32_weight_sum: True # whether to use full fp32 precision for weight_sum during final unpermute in MoE

Commits:

  • update
  • update
  • Update base.yml
  • Update moe.py
  • Update moe.py
@suexu1025 force-pushed the qinwen/add_up_quantize_config branch from 6f0349d to ee4e8cc on October 31, 2025 23:07